\begin{abstract}
In Web search, a user refines his search several times before he finds
the information he needs. It is very likely that, in the search log,
similar sequences of searches appear many times, as many users had
searched the Web with the same intent.  Precisely interpreting the
intent of the user is difficult, even with the help of the search log:
there might be numerous instances of such intent scattering in small
pieces in the log, but none of them is comprehensive enough to
describe the concept precisely.  This scenario occurs in many
applications. For example, patterns in Web search, Internet traffic,
program execution traces, network events, etc., are often
non-stationary, yet the same patterns recur over time.  In this paper,
we argue that visible patterns are generated by hidden intent or
hidden concepts, and precisely characterizing such concepts is only
possible if we cluster as much data generated by such concepts as
possible and learn from the clustered data as a whole, instead of
learning from a single episode of such concept.  The benefits is
obvious as it enables us not only to better understand the underlying
system that generates the data, but also to recognize future instance
of a concept as soon as it occurs. To achieve this, we introduce a
clustering based approach, where we adopt a novel clustering
criterion, validation error minimization, to ensure that the found
concepts are unique and precise.  We propose a two step algorithm,
which uses enhanced dynamic programming and EM like methods for
clustering.  Experiments show that in benchmark datasets, our approach
achieves the highest accuracy with lowest cost in comparison with the
current best approaches.
\end{abstract}

%% inten intentIn this paper, we are interested in
%% finding accurate description of such ``intentions'' or concepts in
%% historical data.  
%% State-of-the-art approaches are insufficient for this purpose as they
%% keep learning or updating models from the evolving data, and use these
%% impromptu models for online prediction.  In many cases, this proves to
%% be both costly and ineffective --- much time is wasted on re-learning
%% recurring concepts, yet the classifier may remain one step behind the
%% current trend all the time. In this paper, we propose a approach which
%% finds unique concepts in the evolving data by clustering. 
%%  At run time, combining historical concept change patterns
%% and cues provided by an online training stream, we find the most
%% likely current concept and use its corresponding models to classify
%% data in an unlabeled stream.  The primary advantage of the high-order
%% model approach is its high accuracy. Another
%% important benefit is that, unlike state-of-the-art approaches, 
